Probabilistic Forecasting of Regional Net-load with Conditional Extremes and Gridded NWP
The increasing penetration of embedded renewables makes forecasting net-load,
i.e. consumption less embedded generation, a significant and growing challenge. Here
a framework for producing probabilistic forecasts of net-load is proposed with
particular attention given to the tails of predictive distributions, which are
required for managing risk associated with low-probability events. Only small
volumes of data are available in the tails, by definition, so estimation of
predictive models and forecast evaluation requires special attention. We
propose a solution based on a best-in-class load forecasting methodology
adapted for net-load, and model the tails of predictive distributions with the
Generalised Pareto Distribution, allowing its parameters to vary smoothly as
functions of covariates. The resulting forecasts are shown to be calibrated and
sharper than those produced with unconditional tail distributions. In a
use-case inspired evaluation exercise based on reserve setting, the conditional
tails are shown to reduce the overall volume of reserve required to manage a
given risk. Furthermore, they identify periods of high risk not captured by
other methods. The proposed method therefore enables users both to reduce costs
and to avoid excess risk.
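The tail-modelling idea can be illustrated with a minimal, self-contained sketch on synthetic data: model exceedances over a high threshold with a Generalised Pareto Distribution. This fits only an unconditional tail (the paper's covariate-dependent GPD parameters are not reproduced here), and all data and names below are illustrative, not the paper's.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Synthetic stand-in for net-load forecast residuals (heavy-tailed).
residuals = rng.standard_t(df=4, size=10_000)

# Peaks-over-threshold: keep exceedances above a high quantile.
threshold = np.quantile(residuals, 0.95)
exceedances = residuals[residuals > threshold] - threshold

# Unconditional GPD fit to the exceedances (location fixed at zero).
shape, loc, scale = genpareto.fit(exceedances, floc=0)

# Tail probability beyond a given level: exceedance rate (5%) times the
# fitted GPD survival function evaluated at the excess over the threshold.
level = threshold + 2.0
p_tail = 0.05 * genpareto.sf(level - threshold, shape, loc=0, scale=scale)
print(f"shape={shape:.3f}, scale={scale:.3f}, P(residual > {level:.2f}) = {p_tail:.4f}")
```

In the paper's conditional version, `shape` and `scale` would instead vary smoothly with covariates such as time of day and weather, which is what sharpens the tail forecasts.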
An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods
The challenges posed by complex stochastic models used in computational
ecology, biology and genetics have stimulated the development of approximate
approaches to statistical inference. Here we focus on Synthetic Likelihood
(SL), a procedure that reduces the observed and simulated data to a set of
summary statistics, and quantifies the discrepancy between them through a
synthetic likelihood function. SL requires little tuning, but it relies on the
approximate normality of the summary statistics. We relax this assumption by
proposing a novel, more flexible, density estimator: the Extended Empirical
Saddlepoint approximation. In addition to proving the consistency of SL, under
either the new or the Gaussian density estimator, we illustrate the method
using two examples. One of these is a complex individual-based forest model for
which SL offers one of the few practical possibilities for statistical
inference. The examples show that the new density estimator is able to capture
large departures from normality, while being scalable to high dimensions, and
this in turn leads to more accurate parameter estimates, relative to the
Gaussian alternative. The new density estimator is implemented by the esaddle R
package, which can be found on the Comprehensive R Archive Network (CRAN).
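The Gaussian baseline that the Extended Empirical Saddlepoint estimator generalises can be sketched in a few lines: simulate summary statistics at a parameter value, fit a multivariate normal to them, and evaluate the observed summaries under that normal. The sketch below is illustrative Python (the esaddle package itself is in R), with a toy model and hypothetical function names.

```python
import numpy as np

def gaussian_synthetic_loglik(s_obs, simulate, theta, n_sim=500, rng=None):
    """Gaussian synthetic log-likelihood: fit a multivariate normal to
    simulated summary statistics and evaluate the observed summaries under
    it. (The paper replaces this normal with an extended empirical
    saddlepoint density to capture departures from normality.)"""
    rng = rng or np.random.default_rng()
    S = np.array([simulate(theta, rng) for _ in range(n_sim)])  # (n_sim, d)
    mu = S.mean(axis=0)
    cov = np.cov(S, rowvar=False)
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    d = len(s_obs)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

# Toy simulator: summaries are (mean, log-variance) of draws centred at theta.
def simulate(theta, rng):
    x = rng.normal(theta, 1.0, size=100)
    return np.array([x.mean(), np.log(x.var())])

rng = np.random.default_rng(1)
s_obs = simulate(2.0, rng)                                      # "observed" data
ll_good = gaussian_synthetic_loglik(s_obs, simulate, 2.0, rng=rng)
ll_bad = gaussian_synthetic_loglik(s_obs, simulate, 0.0, rng=rng)
print(ll_good, ll_bad)  # the true parameter scores much higher
```

Maximising this synthetic log-likelihood over `theta` (e.g. inside an MCMC or optimisation loop) yields the SL estimate; swapping the normal density for the saddlepoint estimator changes only the density-evaluation step.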
Scalable visualisation methods for modern Generalized Additive Models
In the last two decades the growth of computational resources has made it
possible to handle Generalized Additive Models (GAMs) that formerly were too
costly for serious applications. However, the growth in model complexity has
not been matched by improved visualisations for model development and results
presentation. Motivated by an industrial application in electricity load
forecasting, we identify the areas where the lack of modern visualisation tools
for GAMs is particularly severe, and we address the shortcomings of existing
methods by proposing a set of visual tools that a) are fast enough for
interactive use, b) exploit the additive structure of GAMs, c) scale to large
data sets and d) can be used in conjunction with a wide range of response
distributions. All the new visual methods proposed in this work are implemented
by the mgcViz R package, which can be found on the Comprehensive R Archive
Network.
COVID-19 and the difficulty of inferring epidemiological parameters from clinical data
Knowing the infection fatality ratio (IFR) is of crucial importance for
evidence-based epidemic management: for immediate planning; for balancing the
life years saved against the life years lost due to the consequences of
management; and for evaluating the ethical issues associated with the tacit
willingness to pay substantially more for life years lost to the epidemic than
for those lost to other diseases. Against this background, Verity et al. (2020,
Lancet Infectious Diseases) have rapidly assembled case data and used
statistical modelling to infer the IFR for COVID-19. We have attempted an
in-depth statistical review of their approach, to identify to what extent the
data are sufficiently informative about the IFR to play a greater role than the
modelling assumptions, and have tried to identify those assumptions that appear
to play a key role. Given the difficulties with other data sources, we provide
a crude alternative analysis based on the Diamond Princess Cruise ship data and
case data from China, and argue that, given the data problems, modelling of
clinical data to obtain the IFR can only be a stop-gap measure. What is needed
is near direct measurement of epidemic size by PCR and/or antibody testing of
random samples of the at-risk population.
Comment: Version accepted by the Lancet Infectious Diseases. See the previous
version for a less terse presentation.
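The random-sample testing route the abstract advocates can be made concrete with a toy calculation. All numbers below are hypothetical, chosen purely for illustration (this is not the paper's analysis): estimate prevalence from a random sample, scale up to epidemic size, and divide deaths by estimated infections.

```python
from scipy.stats import beta

# Hypothetical inputs, for illustration only.
n_tested = 10_000       # size of the random sample tested (hypothetical)
n_positive = 800        # PCR/antibody positives in the sample (hypothetical)
population = 1_000_000  # at-risk population (hypothetical)
deaths = 1_200          # epidemic deaths in that population (hypothetical)

# Epidemic size estimated from sample prevalence.
prevalence = n_positive / n_tested
infected = prevalence * population

# IFR point estimate, with a 95% interval propagating only the sampling
# uncertainty in prevalence (Jeffreys interval on the sample proportion).
ifr = deaths / infected
lo, hi = beta.ppf([0.025, 0.975],
                  n_positive + 0.5, n_tested - n_positive + 0.5)
ifr_lo, ifr_hi = deaths / (hi * population), deaths / (lo * population)
print(f"IFR = {ifr:.4f} (95% interval {ifr_lo:.4f} to {ifr_hi:.4f})")
```

The point is how directly the IFR follows once epidemic size is measured by random-sample testing, in contrast to the layered modelling assumptions needed when only clinical case data are available.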
A comparison of inferential methods for highly non-linear state space models in ecology and epidemiology
Highly non-linear, chaotic or near chaotic, dynamic models are important in
fields such as ecology and epidemiology: for example, pest species and diseases
often display highly non-linear dynamics. However, such models are problematic
from the point of view of statistical inference. The defining feature of
chaotic and near chaotic systems is extreme sensitivity to small changes in
system states and parameters, and this can interfere with inference. There are
two main classes of methods for circumventing these difficulties: information
reduction approaches, such as Approximate Bayesian Computation or Synthetic
Likelihood and state space methods, such as Particle Markov chain Monte Carlo,
Iterated Filtering or Parameter Cascading. The purpose of this article is to
compare the methods, in order to reach conclusions about how to approach
inference with such models in practice. We show that neither class of methods
is universally superior to the other. State space methods can suffer from
multimodality problems in settings with low process noise or model
mis-specification, leading to bias toward stable dynamics and high process
noise. Information reduction methods avoid this problem but, under the correct
model and with sufficient process noise, state space methods lead to
substantially sharper inference than information reduction methods. More
practically, there are also differences in the tuning requirements of different
methods. Our overall conclusion is that model development and checking should
probably be performed using an information reduction method with low tuning
requirements, while for final inference it is likely to be better to switch to
a state space method, checking results against the information reduction
approach.
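The "extreme sensitivity to small changes" that makes inference hard can be demonstrated with the Ricker map, a standard ecological population model (used here purely as an illustration of near-chaotic dynamics, not as one of the paper's exact examples).

```python
import numpy as np

def ricker(n0, r, steps):
    """Deterministic Ricker map n_{t+1} = n_t * exp(r * (1 - n_t)),
    chaotic for sufficiently large growth rate r."""
    n = np.empty(steps)
    n[0] = n0
    for t in range(steps - 1):
        n[t + 1] = n[t] * np.exp(r * (1.0 - n[t]))
    return n

r = 3.8  # well inside the chaotic regime
a = ricker(0.5, r, 50)
b = ricker(0.5 + 1e-9, r, 50)  # initial state perturbed by one part in 5e8

# The two trajectories diverge to visible differences within a few dozen
# steps, so a likelihood comparing simulated to observed paths point-by-point
# becomes extremely rugged in the initial state and parameters.
print(np.max(np.abs(a - b)))
```

This is exactly why information reduction methods compare summary statistics of trajectories (insensitive to the fine path), while state space methods handle the sensitivity by filtering over the latent states.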
qgam: Bayesian non-parametric quantile regression modelling in R
Generalized additive models (GAMs) are flexible non-linear regression models,
which can be fitted efficiently using the approximate Bayesian methods provided
by the mgcv R package. While the GAM methods provided by mgcv are based on the
assumption that the response distribution is modelled parametrically, here we
discuss more flexible methods that do not entail any parametric assumption. In
particular, this article introduces the qgam package, which is an extension of
mgcv providing fast calibrated Bayesian methods for fitting quantile GAMs
(QGAMs) in R. QGAMs are based on a smooth version of the pinball loss of
Koenker (2005), rather than on a likelihood function; hence, jointly achieving
satisfactory accuracy of the quantile point estimates and coverage of the
corresponding credible intervals requires adopting the specialized Bayesian
fitting framework of Fasiolo, Wood, Zaffran, Nedellec, and Goude (2020b). Here
we detail how this framework is implemented in qgam and we provide examples
illustrating how the package should be used in practice.
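The pinball (check) loss on which QGAMs are built is easy to state: for quantile level tau it penalises under-predictions with weight tau and over-predictions with weight 1 - tau, so its expected value is minimised by the tau-th quantile. The sketch below is an illustrative NumPy version of the raw loss; qgam itself fits a smoothed version of it, in R, within the cited Bayesian framework.

```python
import numpy as np

def pinball_loss(y, mu, tau):
    """Mean pinball (check) loss at quantile level tau: residuals above mu
    are weighted by tau, residuals below mu by (1 - tau)."""
    u = y - mu
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=50_000)

# Scanning constant predictors shows the loss is minimised near the true
# 0.9 quantile of Exp(1), which is -log(0.1), about 2.303.
grid = np.linspace(0.5, 4.0, 351)
losses = [pinball_loss(y, m, 0.9) for m in grid]
best = grid[int(np.argmin(losses))]
print(best)
```

In a QGAM, the constant `mu` is replaced by an additive predictor built from smooth functions of covariates, and the same loss (in smoothed form) drives the fit at each chosen quantile level.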